MPA 5830 - Module 05

Ani Ruhil

2018-08-02


Agenda

In this module you will learn how to build maps with ggplot2 and ggmap, and then with leaflet. There are other packages – see choroplethr, tmap, and sf – that we could use but I will leave it to you to explore these on your own if mapping interests you. leaflet is especially fun and useful for creating interactive maps. We will also look at two other packages – highcharter, to generate interactive plots/maps, and gganimate to animate plots built with ggplot2.

Mapping with ggplot2 and ggmap

Having worked with ggplot2 in the previous module we might as well stick to it since the coding syntax will be relatively familiar. Start by loading the following packages; remember to install them if you get a “package not found” error.

It is pretty easy to build a state or county map. I’ll start with a county-level map of the USA, just to show you how easy it is to build one. Then we can focus on Ohio. The code below shows you how to download the dataframe for all counties and then how to subset it to counties in Ohio.
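A minimal sketch of this step, assuming the county outlines come from ggplot2's map_data() function (which needs the maps package to be installed). The object names usa and oh match the names used later in this module; the exact code the module ran may differ.

```r
library(ggplot2)  # map_data() also requires the maps package to be installed

# County outlines for the contiguous United States
usa <- map_data("county")

# Keep only the rows for Ohio; state names are stored in lowercase
oh <- subset(usa, region == "ohio")

# Both data-frames carry the same six columns
names(usa)
names(oh)
```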

## [1] "long"      "lat"       "group"     "order"     "region"    "subregion"
## [1] "long"      "lat"       "group"     "order"     "region"    "subregion"

You see six columns/variables in the usa data-frame. Pay attention to the contents of each, described below:

  • long = longitude, a measure of east-west position. The prime meridian is assigned the value of 0 degrees, and runs through Greenwich (England). Athens, Ohio has a longitude of -82.101255
  • lat = latitude, a measure of north-south position. The equator is defined as 0 degrees, the North Pole as 90 degrees north, and the South Pole as 90 degrees south. Athens, Ohio has a latitude of 39.329240
  • group = an identifier that is unique for each subregion (here the counties)
  • order = an identifier that indicates the order in which the boundary lines should be drawn
  • region = string indicator for regions (here the states)
  • subregion = string indicator for sub-regions (here the county names)

At this point, go ahead and open google maps in your browser and search for two places that have some meaning, some value to you – hometown, your actual residence, favorite vacation spot, and so on. When you find this place on the map, right-click and try to copy and paste the latitude/longitude you see reported for each place. Save these latitudes/longitudes since you will need them for some mapping exercises at the end of this module.

The actual map can be built as shown below.

But this is not very good because the county borders are incorrectly drawn. Why is that? Because we forgot to specify the grouping structure for drawing these lines (i.e., should all the latitudes/longitudes we see for each county be connected in order by county or should they be connected just by order?). Yes of course, they should be connected in order by county, and this can be specified via the group = argument, resulting in the county borders being drawn correctly.
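Here is a sketch of both versions, without and with the grouping. The object names p_bad and p_good and the coord_fixed(1.3) aspect ratio are my own choices for illustration, not necessarily what the original code used.

```r
library(ggplot2)  # map_data() also requires the maps package to be installed

usa <- map_data("county")

# Without group: every point is joined in one continuous path,
# so stray lines run between counties
p_bad <- ggplot(usa, aes(x = long, y = lat)) +
  geom_polygon(fill = "white", color = "black")

# With group: points are joined county by county
p_good <- ggplot(usa, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black") +
  coord_fixed(1.3)  # assumed aspect ratio so the map does not look stretched

p_good
```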

We could improve upon this map by filling in each county; there are 3000+ counties so the colors will not be unique but that is okay for now. I have also switched the county borders to be inked in white.

What if we want to also draw the state borders?
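One way to do both things, sketched under the assumption that the state outlines also come from map_data(). The counties are filled by county name with white borders, and the states are layered on top with no fill so that only their outlines show. The names states and p_states are illustrative.

```r
library(ggplot2)  # map_data() also requires the maps package to be installed

usa <- map_data("county")
states <- map_data("state")

p_states <- ggplot() +
  # Counties: fill by county name, white borders, legend hidden
  geom_polygon(data = usa,
               aes(x = long, y = lat, group = group, fill = subregion),
               color = "white", show.legend = FALSE) +
  # States: drawn last, with no fill, so only the borders are visible
  geom_polygon(data = states,
               aes(x = long, y = lat, group = group),
               fill = NA, color = "black") +
  coord_fixed(1.3)

p_states
```

The order of the layers matters here: if the state layer came first, the filled counties would be painted over the state borders.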

Now, you could also do this with another package, urbnmapr so let us see it very briefly. Here is a map with state borders and then with county borders. Note that Alaska and Hawaii are visible in these maps but were conspicuously absent from the earlier maps drawn via ggplot2 and ggmap. Note that you may have to install it with the command shown below:

devtools::install_github("UrbanInstitute/urbnmapr")

If you want to learn more about the urbnmapr package you should read about it here
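A short sketch of the two urbnmapr maps, assuming the package is installed from GitHub and that its bundled states and counties data-frames carry long, lat, and group columns (this is how the package documents them, but check your installed version). The plot names are illustrative.

```r
library(ggplot2)
library(urbnmapr)  # installed from GitHub, not from CRAN

# State borders; Alaska and Hawaii are already repositioned in these data
p_urbn_states <- ggplot(urbnmapr::states,
                        aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey", color = "white") +
  coord_fixed(1.3)

# County borders, built the same way from the counties data
p_urbn_counties <- ggplot(urbnmapr::counties,
                          aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey", color = "white") +
  coord_fixed(1.3)

p_urbn_states
```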

Labeling points and other features

Okay, so much for basic maps. Now we drill down to Ohio counties alone and see how we can label them, add points to them, and then also use a color scheme to color them on the basis of some measure such as percent of children in poverty in the county, median household income, and so on.

In order to label counties we will need to find the center of each county and then place the county name at that latitude/longitude. Note that taking the mean/median of latitude/longitude will not work, because the boundary points are not evenly spread around each county, so we use specific code to find the centroids. Unfortunately, county names will have to be formatted into titlecase since they show up as subregion with all lowercase characters.

We start by leaning on the stringr package to correct how names will be displayed. We then calculate the mid-point of each county and call the result centroids2.
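Here is one way these two steps might look. I am assuming the mid-point is taken as the middle of each county's longitude and latitude range (the center of its bounding box), which is a simple approximation rather than a true geometric centroid. The column name county is chosen to match the merge key used later.

```r
library(ggplot2)  # map_data() also requires the maps package to be installed
library(stringr)

oh <- subset(map_data("county"), region == "ohio")

# "athens" becomes "Athens", "van wert" becomes "Van Wert"
oh$county <- str_to_title(oh$subregion)

# Mid-point of each county's longitude range and latitude range
centroids2 <- aggregate(cbind(long, lat) ~ county, data = oh,
                        FUN = function(x) mean(range(x)))

head(centroids2)
```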

Now we are ready to plot …

Note that geom_text() is used to add the county names to the map. The color = and size = commands are tweaking the font color and size. If I relied on geom_label() I would have a less effective map with needless borders around each county name.

I would also like to highlight where Ohio University’s Athens and Lancaster campuses are located. I know their latitudes/longitudes – Athens is 39.324391, -82.101443 and Lancaster is 39.738743, -82.586373. Let me use a red dot for each.
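The labeled map with the two campus dots could be sketched as follows. The campus coordinates are the ones given above; the font color, the text size, and the bounding-box mid-points used for label placement are my assumptions.

```r
library(ggplot2)  # map_data() also requires the maps package to be installed
library(stringr)

oh <- subset(map_data("county"), region == "ohio")
oh$county <- str_to_title(oh$subregion)
centroids2 <- aggregate(cbind(long, lat) ~ county, data = oh,
                        FUN = function(x) mean(range(x)))

# Campus coordinates as given in the text
campuses <- data.frame(
  campus = c("Athens", "Lancaster"),
  lat    = c(39.324391, 39.738743),
  long   = c(-82.101443, -82.586373)
)

p_oh <- ggplot(oh, aes(x = long, y = lat)) +
  geom_polygon(aes(group = group), fill = "white", color = "gray50") +
  # County names at the mid-points; geom_text() draws no box around them
  geom_text(data = centroids2, aes(label = county),
            color = "navy", size = 2.25) +
  # One red dot per campus
  geom_point(data = campuses, color = "red", size = 2) +
  coord_fixed(1.3)

p_oh
```

The group aesthetic sits inside geom_polygon() only, because the label and point layers use data-frames that have no group column.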

Using another variable to fill the map

Now, often you will see maps that use a color scheme to represent some spatial distribution, median household income, poverty, and so on. Well, we will build one such map by using the percent of children in poverty in each county in Ohio. The data were sent to you via slack so make sure the file is in your data folder. We will read it in and then build the map.

Now we merge this file with oh, noting that the merge key will be county since both files have this variable.

Now we start building the map.
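Since the child poverty file is not reproduced here, the sketch below swaps in simulated values (they are NOT real poverty rates) just to show the mechanics of the merge and the fill. The names pov, child_pov, and oh_pov are hypothetical.

```r
library(ggplot2)  # map_data() also requires the maps package to be installed
library(stringr)

oh <- subset(map_data("county"), region == "ohio")
oh$county <- str_to_title(oh$subregion)

# Placeholder data: one simulated value per county, not real poverty rates
set.seed(5830)
pov <- data.frame(county = unique(oh$county))
pov$child_pov <- round(runif(nrow(pov), min = 10, max = 40), 1)

# merge() does not keep the original row order, so re-sort by `order`
# before drawing or the county borders will be scrambled
oh_pov <- merge(oh, pov, by = "county")
oh_pov <- oh_pov[order(oh_pov$order), ]

p_pov <- ggplot(oh_pov, aes(x = long, y = lat, group = group, fill = child_pov)) +
  geom_polygon(color = "white") +
  scale_fill_gradient(low = "white", high = "firebrick") +
  coord_fixed(1.3) +
  labs(fill = "Children in poverty (%)")

p_pov
```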

That isn’t a bad map but we could do better, by creating quartiles (4 groups) or quintiles (5 groups) so that it is easier to pinpoint which county falls into the top 25%, bottom 20%, and so on of whatever measure it is that we grouped. The code below shows you how to generate the quintiles and map with them.

What if we wanted only three groups, say the top one-third, middle one-third, and then the lowest one-third?
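Both groupings (fifths and thirds) can be sketched with ntile() from dplyr, which is one of several ways to do this; cut() with quantile() would also work. The data below are simulated stand-ins and the column names are hypothetical.

```r
library(dplyr)

# Simulated stand-in for the county-level measure (not real data)
set.seed(5830)
pov <- data.frame(county = paste("County", 1:88),
                  child_pov = round(runif(88, min = 10, max = 40), 1))

pov <- pov %>%
  mutate(
    quintile = factor(ntile(child_pov, 5)),  # 1 = lowest fifth, 5 = highest fifth
    tercile  = factor(ntile(child_pov, 3))   # 1 = lowest third, 3 = highest third
  )

table(pov$quintile)
table(pov$tercile)
```

Once the grouping column is merged into the map data, using fill = quintile (or fill = tercile) in the aesthetics gives one discrete color per group instead of a continuous gradient.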

Now, all of the maps built with ggplot2 and ggmap have been limited to just the 48 contiguous states; Alaska and Hawaii are absent. Can we fold these in? Yes we can, with the fiftystater package.
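A minimal sketch, assuming the fiftystater package is installed (it may no longer be available from CRAN, in which case it has to come from an archive or from GitHub). The package ships a fifty_states data-frame in which Alaska and Hawaii have been moved into insets; the plot name p50 is illustrative.

```r
library(ggplot2)
library(fiftystater)  # may need to be installed from an archive or GitHub

data("fifty_states")  # state outlines with Alaska and Hawaii as insets
names(fifty_states)

p50 <- ggplot(fifty_states, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black") +
  coord_fixed(1.3)

p50
```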

## [1] "long"  "lat"   "order" "hole"  "piece" "id"    "group"

leaflet maps

leaflet is an easy-to-learn JavaScript library that generates interactive maps. It is fun and the possibilities are endless. You will need three libraries so make sure you install leaflet, leaflet.extras, and widgetframe. The basic command structure is as follows: you call leaflet via leaflet() and specify the latitude/longitude to be used to center the map, and the zoom factor to be applied. The higher the zoom number, the more zoomed-in the view will be; the smaller the zoom number, the more zoomed-out the view will be. The addTiles() command adds default tiles but you can tweak this (see the book chapter for examples). The setMapWidgetStyle() and the other commands that follow allow you to customize how the map will look in the knitted html document.
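A bare-bones version of this structure, using only the leaflet package. The styling calls from leaflet.extras and widgetframe are left out to keep the sketch small, and the zoom level of 14 is my own choice.

```r
library(leaflet)

m <- leaflet() %>%
  # Center on Athens, Ohio; a larger zoom value means a closer view
  setView(lng = -82.101255, lat = 39.329240, zoom = 14) %>%
  # Default map tiles
  addTiles()

m
```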

The map that shows up has been centered around Athens, Ohio and is using the default map-tiles. Note that you can zoom in/out with the map, as well as move the map around so that you end up in some other place.

So far so good. Now how about dropping a pin on Building 21 on The Ridges, the main administrative building of the Voinovich School? This is done with addMarkers(), with the popup = c() argument indicating what text should be displayed when the marker is clicked.
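The text does not list the coordinates for Building 21, so this sketch uses the Athens campus coordinates quoted earlier in the module as a stand-in; swap in the building's own latitude/longitude to pin it exactly. The popup text and zoom level are also placeholders.

```r
library(leaflet)

m_pin <- leaflet() %>%
  setView(lng = -82.101443, lat = 39.324391, zoom = 15) %>%
  addTiles() %>%
  # Stand-in coordinates (Athens campus); the popup appears when the pin is clicked
  addMarkers(lng = -82.101443, lat = 39.324391,
             popup = c("Ohio University, Athens campus"))

m_pin
```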

This, in a nutshell, is the basic setup of a leaflet map. There is tons more you could do so if interested, check out the several examples out there on the web, starting with this documentation. For now we see a few more extensions. Here, for instance, is how one might take a large data-set and display specific features. In particular, let us map some bike-share stations in New York City. The actual data-frame is rather large so we draw a random sample of 30 rows with the sample_n() command from dplyr. The map will show pop-ups that, if clicked, will display the location of the bike-share station.
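The bike-share data are not reproduced here, so the sketch below builds a synthetic stand-in (random points in a rough box around New York City, with made-up station names) purely to show the sampling and the pop-ups. The column names name, lat, and lng are hypothetical.

```r
library(leaflet)
library(dplyr)

# Synthetic stand-in for the bike-share stations (not real locations)
set.seed(5830)
stations <- data.frame(
  name = paste("Station", 1:500),
  lat  = runif(500, min = 40.68, max = 40.80),
  lng  = runif(500, min = -74.02, max = -73.93)
)

# Draw a random sample of 30 rows
stations30 <- sample_n(stations, 30)

# The ~ prefix tells leaflet to look the column up in the data
m_bikes <- leaflet(stations30) %>%
  addTiles() %>%
  addMarkers(lng = ~lng, lat = ~lat, popup = ~name)

m_bikes
```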

Multiple plots on one canvas with patchwork

When you are building a visualization you often end up needing to squeeze multiple graphics into a single canvas. There are several ways to do it in R but I am showing you what may be the easiest way to do it – with patchwork. You may have to install it via devtools as shown below:

devtools::install_github("thomasp85/patchwork")

Start by loading patchwork and ggplot2 (and any other libraries you plan to use for the plots). Start by naming each plot; most of us end up naming them p1, p2, and so on (why? because those were the earliest examples on the web). Then decide on how you want the plots to show up. That is, how many plots do you have? Should they be side-by-side? If you have an odd-number of plots, should two be side-by-side and the third in a row below these two? Or the other way around? What plot should come first? What plot should be last? Let us create three plots and see how the package works.

Say I want two plots, each in its own column.

Hmm, maybe one per row?
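Here is a sketch of three plots and the two layouts just described. The plots use the built-in mtcars data as a stand-in for whatever you are actually plotting; the names side_by_side and stacked are illustrative.

```r
library(ggplot2)
library(patchwork)

# Three simple plots to combine
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() + ggtitle("p1")
p2 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot() + ggtitle("p2")
p3 <- ggplot(mtcars, aes(x = hp)) + geom_histogram(bins = 10) + ggtitle("p3")

# Two plots, each in its own column
side_by_side <- p1 + p2

# Two plots, one per row
stacked <- p1 + p2 + plot_layout(ncol = 1)

side_by_side
stacked
```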

You see plot_layout() used in the second plot command. It has several options, the key ones being

  • ncol, nrow: number of columns/rows
  • byrow: how should the plots be embedded, by filling columns first or by filling rows first?
  • widths, heights: relative widths/heights of each column and row in the grid. Will get repeated to match the dimensions of the grid.

Say I want to fill row 1 with p1, p2, then row 2 with p1, p2

What if I want to fill column 1 with p1, p2, then column 2 with p1, p2?
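Both fill orders can be sketched with the byrow option. The plots are again built from mtcars as stand-ins, and the names by_rows and by_cols are illustrative.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() + ggtitle("p1")
p2 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot() + ggtitle("p2")

# Row 1 holds p1, p2 and row 2 holds p1, p2
by_rows <- p1 + p2 + p1 + p2 + plot_layout(ncol = 2, byrow = TRUE)

# Column 1 holds p1, p2 and column 2 holds p1, p2
by_cols <- p1 + p2 + p1 + p2 + plot_layout(ncol = 2, byrow = FALSE)

by_rows
by_cols
```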

I can also use () to group sub-plots.

Note that the | places plots side by side (in columns) and the / stacks plots on top of one another (in rows)

and then one can specify the heights/widths of each plot.
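A sketch of grouping and sizing, with mtcars plots as stand-ins. In patchwork, | puts plots next to each other and / puts one above the other; the names grouped, sized, and wide_left are illustrative.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() + ggtitle("p1")
p2 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot() + ggtitle("p2")
p3 <- ggplot(mtcars, aes(x = hp)) + geom_histogram(bins = 10) + ggtitle("p3")

# p1 and p2 next to each other, with p3 spanning the row underneath
grouped <- (p1 | p2) / p3

# Same layout, but the top row is twice as tall as the bottom row
sized <- (p1 | p2) / p3 + plot_layout(heights = c(2, 1))

# Two columns where the left plot is twice as wide as the right one
wide_left <- p1 + p2 + plot_layout(widths = c(2, 1))

sized
```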

You can explore other settings here and if you want to see another package that tries to achieve similar results, explore cowplot here.

Interactive graphics with highcharter

highcharter is one of my favorite packages for dynamic plots because it builds them with ease and yet they are visually stunning (see below). This is advanced material so be warned.

The first plot is a heatmap using unemployment rates (value) in counties (code in countries/us/us-all-all).

Here is a scatterplot built with the epa.RData and keeping only 100 randomly sampled observations (to keep things manageable). Notice the specification scatter for chart-type, and the hcaes().
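The epa.RData file is not reproduced here, so this sketch uses a random sample from the built-in mtcars data instead, just to show the shape of the call: the data, then the chart type, then the hcaes() mapping. The sample size of 20 and the chosen variables are my own.

```r
library(highcharter)
library(dplyr)

# Stand-in data: a random sample of rows from mtcars
set.seed(5830)
df <- sample_n(mtcars, 20)
df$cyl <- factor(df$cyl)

# "scatter" is the chart type; hcaes() plays the role that aes() plays in ggplot2
hc_scatter <- hchart(df, "scatter", hcaes(x = wt, y = mpg, group = cyl))

hc_scatter
```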

Here is a line chart using the unemployment rate data. I am extracting year so I can use it for the x-axis, and then calculating the average unemployment rate by year and education group (educ_group).

And then dressing up the highcharter plot with themes and some customization.
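The line chart and the dressing-up step could look something like the sketch below. The unemployment data are simulated (not real rates), the column names date and rate are hypothetical, and hc_theme_economist() is just one of the themes that ship with highcharter.

```r
library(highcharter)
library(dplyr)

# Simulated monthly rates for two education groups (not real data)
set.seed(5830)
unemp <- expand.grid(
  date = seq(as.Date("2010-01-01"), as.Date("2014-12-01"), by = "month"),
  educ_group = c("High school", "College"),
  stringsAsFactors = FALSE
)
unemp$rate <- round(runif(nrow(unemp), min = 3, max = 10), 1)

# Extract the year, then average the rate by year and education group
by_year <- unemp %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%
  group_by(year, educ_group) %>%
  summarise(avg_rate = mean(rate), .groups = "drop")

# One line per education group, then a title, axis labels, and a theme
hc_line <- hchart(by_year, "line",
                  hcaes(x = year, y = avg_rate, group = educ_group)) %>%
  hc_title(text = "Average unemployment rate by education group (simulated)") %>%
  hc_xAxis(title = list(text = "Year")) %>%
  hc_yAxis(title = list(text = "Rate (%)")) %>%
  hc_add_theme(hc_theme_economist())

hc_line
```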

Animated graphics with gganimate

This is advanced material too so you have been twice warned! The gganimate library can be tricky to run without errors and hiccups because it needs other packages to be installed and configured; see here for details. To complicate matters, gganimate is being completely overhauled and the new version should be released in the next few weeks so be sure to check its documentation available here.

This code rebuilds a famous visualization a la Hans Rosling, coming close to at least capturing the spirit of Hans. Life expectancy is mapped for the continents by gross domestic product per capita, and across years. Each color represents a country within the continent, and the size of the bubbles is proportional to the country’s population size.
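A sketch of this animation written against the overhauled gganimate interface (transition_time() and friends), which may differ from the code this module originally ran. It assumes the gapminder package is installed for the data and for the country_colors palette; the facets by continent and the axis labels are my own choices.

```r
library(ggplot2)
library(gganimate)
library(gapminder)

anim <- ggplot(gapminder,
               aes(x = gdpPercap, y = lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +  # one color per country
  scale_size(range = c(2, 12)) +                  # bubble size tracks population
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(title = "Year: {frame_time}",
       x = "GDP per capita", y = "Life expectancy") +
  transition_time(year) +                         # one frame sequence across years
  ease_aes("linear")

# animate(anim) renders the frames; it needs a renderer package such as gifski
```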


Practice Tasks

(1)

Create a map of the 48 contiguous states in the United States. Be sure to title the map and to fill in each state with colors while drawing state borders in white. Make sure you add state names by first calculating the centroids of each state and then merging these latitudes and longitudes with the map data. Use theme_map() and make sure the legend is not visible.

(3)

Use the original USArrests data to draw scatterplots of (a) Murder versus UrbanPop, (b) Assault versus UrbanPop, and (c) Rape versus UrbanPop. Save each of these scatterplots by name and then use patchwork to create a single canvas that includes all three plots. Make sure you label the x-axis, y-axis, and title each plot.

(4)

Now create highcharter versions of each of the three scatterplots you created in (3) above. You should end up with three scatterplots, each on its own canvas.

(5)

Use leaflet to create a map that includes a popup for your place of birth. You will need to use Google maps to find the latitude/longitude for this place. The popup should display the name of this place.